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Abstract 



The need to calibrate increasingly complex statistical models requires a per- 
c/3 ' sistent effort for further advances on available, computationally intensive Monte- 

Carlo methods. We study here an advanced version of familiar Markov-chain 
Monte-Carlo (MCMC) algorithms that sample from target distributions defined 
as change of measures from Gaussian laws on general Hilbert spaces. Such a 
^»0 ' model structure arises in several contexts: we focus here at the important class 

of statistical models driven by diffusion paths whence the Wiener process con- 
stitutes the reference Gaussian law. Particular emphasis is given on advanced 
Hybrid Monte-Carlo (HMC) which makes large, derivative-driven steps in the 
state space (in contrast with local-move Random-walk-type algorithms) with 



m 

fvj . analytical and experimental results. We illustrate it's computational advan- 

tages in various diffusion processes and observation regimes; examples include 
stochastic volatility and latent survival models. In contrast with their stan- 
dard MCMC counterparts, the advanced versions have mesh-free mixing times, 

^^ , as these will not deteriorate upon refinement of the approximation of the in- 

herently infinite-dimensional diffusion paths by finite-dimensional ones used in 
practice when applying the algorithms on a computer. 
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1. Introduction 

Markov chain Monte-Carlo (MCMC) methods provide an intuitive, pow- 
erful mechanism for sampling from complex posterior distributions arising in 
applications (see [21[ for a review of algorithms and applications). The rapidly 
increasing complexity of statistical models employed by practitioners requires 
a parallel effort at advancing MCMC methodology to deliver algorithms that 
can provide fast model exploration. Ideally, suggested algorithms should be of 
enough flexibility to cover a wide range of model structures. This paper will 
move along such directions by proposing and studying advanced versions of 
standard MCMC algorithms to improve algorithmic performance on complex, 
high-dimensional models. 

The advanced algorithms are relevant for target distributions defined as 
change of measures from Gaussian laws. Within such a structure, we will be 
focusing here upon the important class of statistical models driven by Stochastic 
Differential Equations (SDEs) posing concrete computational challenges; here, 
Brownian motion constitutes the reference Gaussian measure. The paper will 
develop and test advanced MCMC algorithms for the computationally demand- 
ing task of reproducing sample paths of SDEs under various direct or indirect 
observation regimes. The ability to sample the realised dynamics driving the 
data mechanism is of high importance for understanding the model behavior. 
Also, due to the typical intractability of the likelihood function in SDE contexts, 
the underlying diffusion path is many times treated as a latent variable within 
Gibbs samplers and it's fast sampling is critical for the efficiency and feasibility 
of parametric inference procedures. 

The advanced MCMC methods follow closely recent developments [(J |6( over 
algorithms that take advantage of the structure of target distributions II being 
determined as a change of measure from a Gaussian one IIo = N(0,C), that is: 

^-( x ) = &q? {-$(x)}, (i) 

for some function $ defined on a Hilbert space T-L. The method exploits the 
relation with the Gaussian measure to evolve standard MCMC algorithms into 
advanced ones with the critical computational advantage that their convergence 
properties are mesh-free, i.e. their mixing times do not deteriorate as the di- 
mension of the state space increases when refining relevant finite-dimensional 
projections (used in practice on a computer) to better approximate inherently 
infinite-dimensional elements of T-L. In the SDE context, finite-difference meth- 
ods are commonly employed to approximate the infinite-dimensional diffusion 
sample paths whence a discretisation mesh will be specified. We will be looking 
on advanced versions of Random- walk Metropolis (RWM), Metropolis-adjusted 
Langevin algorithm (MALA) and Hybrid Monte-Carlo (HMC). MALA and 
HMC both use information about the derivative of the log-target to drive the 
algorithmic dynamics, whereas RWM uses blind proposals. Emphasis will be 
given on HMC, employing Hamiltonian dynamics, as it's nature to perform 



global designated steps on the SDE pathspace seems to provide significant com- 
putational advantages compared to the other two local-move algorithms; this 
will be illustrated both analytically and experimentally. 

A first methodological contribution of the paper will be to define HMC on a 
Hilbert space, similar to Hilbert-space valued RWM and MALA derived in [9(. 
A similar attempt for HMC has taken place in [6]; the method there required 
strict, analytical Lipschitz-continuity assumptions on the ^-derivative 5$>i\. In 
this paper, a much simpler derivation will be proven under minimal probabilistic 
assumptions on $ relevant for a much wider range of practical applications that, 
for instance, could involve inherently stochastic terms (e.g. stochastic integrals) 
at the specification of $. The concrete consequence of the mathematical verifi- 
cation of a well-defined algorithm on the infinite-dimensional Hilbert space T-L is 
an anticipated mesh-free mixing time for the practical algorithm that will run 
on some A^-dimensional projection (typically, finite-difference) of the pathspace. 
Another methodological contribution of the paper will be that it will analyti- 
cally illustrate, in the context of directly observed diffusions, that 'clever' use 
of information on the derivative of the log-target (within HMC, compared to 
MALA) can remove orders of complexity from the computational costs of the 
MCMC algorithm. Albeit proven in a linear scenario, the result will be later on 
empirically manifested in applications on realistic models. In addition to verify- 
ing the theoretical results, the empirical contributions of the paper relate with 
the implementation of the pathspace algorithms on several practical problems 
including stochastic volatility and various diffusion driven models. As demon- 
strated in relevant applications, HMC algorithms avoid the random-walk-type 
behaviour of the other algorithms and greatly outperform them, thus constitut- 
ing a powerful method for tackling high-dimensional path-sampling problems. 

1.1. Diffusion-Driven Models 

Stochastic Differential Equations (SDEs) provide a powerful and flexible 
probabilistic structure for modeling phenomena in a multitude of disciplines: 
finance, biology, molecular dynamics, chemistry, survival analysis, epidemiology, 
just to name a few (see [30( for a review of applications). A large class of SDEs 
can be specified as follows: 

dV t = n(V t ;8)dt + a{Vf,9)dB t , (2) 

for drift and diffusion coefficient pi{-,6) : R d h-> R d and o(-,6) : R d x R d i-> 
M. d x M. d respectively (of known functional form up to some unknown parameter 
9 £ M. p ), and Brownian motion B t . Such a differential modeling structure is 
sometimes implied by physical laws (e.g. Langevin equation in physics) or is 
selected due to its adaptiveness by the practitioner to represent time-evolving 
quantities. 



1 the 5- notation refers to the Frechet generalisation of differentiation on general Hilbert 
spaces; in particular, on the pathspace, under sufficient regularity, it corresponds to the notion 
of the variational derivative. 



The diffusion process denned by ^ may be observed in a number of ways. 
The case where data form a discrete skeleton of the diffusion path has been 
studied extensively (see for example @, 0, LL21 LLII), but different types of ob- 
servation regimes often appear in applications. For example in PK/PD studies 



34[ , growth curve models [14( and bioinformatics 24[ , the stochastic dynamical 
systems are observed with error. In financial applications, and in particular 
stochastic volatility models [20] , the observation model consists of partial data 



or integrals of the underlying diffusion process [12J. Event time models as in 
III \2M > involve a latent diffusion process that is observed through random bar- 
rier hitting times. In this paper we demonstrate how the above framework can 
be unified through formulations that consist of a diffusion process defined by 
([2]) and are completed by an appropriate (typically Lebesgue) density p(y\v,6) 
for the observations Y conditionally on the diffusion path V and 9. We then 
proceed and develop a general and efficient HMC algorithm to handle models 
within this framework. 

Inferential procedures about 9 are perplexed by the typical unavailability of 
the likelihood function p(y\0) in closed form. MCMC methods could in principle 
overcome such an issue via data augmentation, whereby Gibbs sampler steps 
within the algorithm would switch from sampling the unobserved diffusion path 
V conditionally on 9 and Y to sampling 9 conditionally on V and Y, Our 
advanced MCMC methods are relevant here for the challenging high-dimensional 
path-update of V. 

The construction of such data augmentation MCMC samplers has to address 
two important issues arising from inherent characteristics of diffusion processes. 
The first issue is the singularity between the parameters in the diffusion coeffi- 
cient and the latent diffusion paths (i.e. conditionally on V, the parameters in 
the diffusion coefficient have a Dirac distribution) caused by the quadratic vari- 
ation process. The problem was identified in [36] noting that the convergence 
of an MCMC Independence sampler deteriorates as 0{N) unless appropriate 
reparametrisation is applied (N being the number of discrete instances of the 
path considered in practice when executing the method on a personal computer) . 
Various such reparametrisations are currently available and some of them will 
be used in the sequel. The second issue is the construction of efficient proposals 
for the update of the (high-dimensional) latent paths. The efficiency and tun- 
ing of such updates are critical for the overall performance of the algorithms. 
Most existing approaches adopt Independence sampler steps proposing paths 
from easier-to-handle diffusion processes (typically, Brownian motion paths) as 
candidates for paths of the target V|Y, 0. While these approaches may work 
well in some cases, they can also perform unacceptably poorly in many other 
ones when target paths are very different from Brownian ones. 

The advanced MCMC samplers studied in this paper connect with both 
above issues. First, methods used for decoupling the dependence between V 
and 9, achieve this via the consideration of some 1-1 transform X = r](V;9) 
so that the distribution of X has a density w.r.t. a probability measure that 
will not involve 9: this reference measure is precisely a Brownian motion (or 
a Brownian bridge). Thus, the decoupling approaches conveniently deliver tar- 



get distributions which are change of measures from Gaussian measures of the 
structure ([1]). More analytically, using the reparametrisation method of 24] our 
advanced samplers will be, in principle, applicable under the condition: 

v n. <j{v\ 9) is invertible, for any v in the state space of V . (3) 

For the second issue, the advanced HMC sampler provides a powerful mechanism 
for delivering efficient updates for the high-dimensional latent paths as: (i) it 
will propose non-blind path updates, driven by the log-derivative of the target 
distribution; (ii) proposed updates will make large steps in the path space thus 
offering the potential for efficient mixing; (iii) it's mixing time will be mesh-free, 
i.e. will not deteriorate with increasing N. 

The paper is organised as follows: Section[5]contains some background mate- 
rial regarding Gaussian distributions on separable Hilbert Spaces. The advanced 
HMC sampler is developed in Section [3] where other MCMC samplers for diffu- 
sion pathspaces are also presented. In Section |4] we demonstrate the advantages 
offered by the HMC algorithm through an analytical study on the Ornstein- 
Uhlenbeck (OU) diffusion process. The theory developed in Section [3] is used 
in Section [5] to extend the framework the HMC algorithm to a rich class fam- 
ily of diffusion models with general diffusion coefficient and various observation 
regimes. Section [5] verifies the theoretical results and illustrates the developed 
algorithms in various simulation experiments. Further generalisations are pro- 
vided in Section [71 while Section [5] concludes with some relevant discussion and 
future directions. 



2. Gaussian Measures on Hilbert Spaces 

We collect here some background material (see e.g. [13j) on Gaussian dis- 
tributions on a separable Hilbert space Ti that will assist at the presentation 
of the later sections. The Cameron-Martin space, Ho, of the Gaussian law 
Ho = N(0,C) coincides with the image space of C 1 ' 2 . Essentially, T-io includes 
all elements of the Hilbert space which preserve the absolute continuity prop- 
erties of Ho upon translation. This is made mathematically explicit via the 
following proposition. 

Proposition 2.1. If T(v) = v + C 1 / 2 ^ for a constant vq <E H then U and 
n o T^ 1 are absolutely continuous w.r.t. each other with density: 

d{n oT~ } , ^-i/2 \ ii |2-i 
— (v) =cxp{(v ,C 'v)-±\v r}. 

Proof. This is Theorem 2.21 of [II]. □ 

For the diffusion pathspace we focus upon in this paper, the target distri- 
bution Tl(dx) is defined on the Hilbert space of squared integrable paths 1-L = 
L 2 ([0,£],1R) (with appropriate boundary conditions) for some length i > 0. The 
centered Gaussian reference measure Hq will correspond to a Brownian motion 



(thus, boundary condition x(0) — 0) or a Brownian Bridge (x(0) = x(£) = 0). 
The covariance operator is connected with the covariance function c{u, v) of the 
Gaussian process via: 



(C/)(«)= / c(u,v)f(v)dv , 
Jo 



f en 



With this in mind, the covariance operators C bm , C bb of the Brownian motion 
and Brownian bridge respectively are as below: 

(C bm f)(u)= / (uAv)f(v)dv = u / f(v)dv- / f(v)dvds; (4) 

Jo Jo Jo Jo 

(C bb f)(u) = I (uAv-^f)f(v)dv 

pt pS pU pS 

= 7 / / f(v)dvds- / / f(v)dvds. (5) 

* Jo Jo Jo Jo 

The Cameron-Martin spaces Hq" 1 and "H|f °f a Brownian motion and Brownian 
bridge respectively are analytically specified as follows (see e.g. Lemma 2.3.14 
of [11| for the case of Brownian motion; Brownian bridge involves the extra 
boundary condition x{£) = 0): 

Ho™ = {x : [0,£] M- R : 3 f E L 2 ([0,£],M.) such that x(t) = \ f(s)ds} ; 

J[0,t] 

n b Q b = {x : [0,1] h-> R : 3 / e i 2 ([0,^],M) such that &(*) = / /(s)ds, i(f) = 0} 

J[o,t] 

The Karhunen-Loeve representation of N(0, C) will be used later on. Analyt- 
ically, considering the eigen-decomposition {A p , 4> p } P K Li of C so that C 4> P = A p </> p , 
we have that the (normalised) eigenfunctions {4>p\ < ^ = i constitute an orthonor- 
mal basis for the Hilbert space "H. In particular, for x ~ N(0,C) we have the 
expansion: 

oo oo oo 

x = ^(z, (j> p )<j)p =^2x p (f)p =^yAp"C P 0p i (6) 

p=l p— 1 p— 1 

where {^pjp*^ are iid variables from jV(0, 1). 

3. Advanced HMC on Hilbert Space 

HMC uses the derivative of the log-target density, in the form of Hamil- 
tonian dynamics, to generate moves on the state space. An appropriate ac- 
cept/reject decision will then force reversibility w.r.t. the target distribution. 
Uniquely amongst alternative MCMC algorithms, HMC allows for the synthe- 
sis of a number of derivative-guided steps before the accept/reject test, thus 
permitting large, designated steps in the state space. 



The standard version of HMC first appeared in [15(. An advanced version 
for infinite-dimensional targets of the form ([I]) appeared in _|6| . One of our ob- 
jectives here is to extend the relevance of the method in [6j] to a much wider 
class of SDE-driven applications. In particular, [6[ defines an HMC algorithm 
on a Hilbert space under strong analytical Lipschitz-continuity assumptions on 
Sobolev norms of <5$ which are difficult to verify and are, indeed, not relevant 
for the common scenario when $ will involve inherently stochastic terms (like 
stochastic integrals). Here, we provide an alternative derivation of advanced 
HMC which avoids such strong assumptions of earlier works, and is relevant 
for many practical applications. Such a construction of the algorithm on the 
Hilbert space will provide a justification for order C(l)-mixing times for the 
related TV-dimensional projected algorithm. This justification will not be ana- 
lytically proven here as it would require a level of mathematical technicalities 
far beyond the application-motivated scope of this paper; we note that all nu- 
merical applications shown later will empirically verify the C(l)-mixing times 
of the samplers. 

We will only require the following condition (recall that C is the covariance 
operator of the Gaussian measure at the definition of the target n in (JTJ) ) : 

Assumption 3.1. C 8$>{x) is an element of the Cameron- Martin space of the 
Gaussian measure n (so C 8$>(x) G ImC 1 ' 2 ) for all x in a set with probability 
1 under n . 



We will discuss Assumption 13.11 in the context of our applications in the 
sequel. Note that the final algorithm suggested later in this section coincides 
with the one presented in [6( when the strict analytical conditions of that paper 
on <f> are satisfied; the new theoretical development here is to construct an 
alternative proof for the well-definition of the algorithm on the Hilbert space 
that requires only Assumption 13.11 and, as a consequence, greatly extends the 
relevance of the developed method for applications. 

Some of the calculations carried out in the development of the algorithm in 
the sequel make explicit sense for H = R w and only formal sense when % is an 
infinite-dimensional separable Hilbert space. We can write (formally in the case 
of infinite-dimensional %): 

U(x) ex exp{-$(x) - \{x,Lx)} , (7) 

where we have defined L = C _1 . 

3.1. Hamiltonian Dynamics 

The state space is now extended via the introduction of an auxiliary 'velocity' 
v € H; the original argument x G W can be thought of as 'location'. We consider 
the 'total energy' function: 

H(x,v;M) = $(x) + Ux,Lx) + Uv,Mv) , xeH, (8) 



for a user-specified 'mass' operator M such that M l is a well-defined covariance 
operator on T-L. We define the distribution on T-L x %: 

U(x, v) oc exp{-H(x, v; M )} = exp{-$(a;) - | (x, La;) - § (u, Mv)} . 

Note that under II(a;,i>) we have v ~ iV(0, Af^ 1 ). The Hamiltonian dynamics 
on "H = I 1 *' are defined as follows: 

dx 1 dH j.,t^ v ®H 

dt dv ' dt dx ' 

or, equivalently: 

— =v, M — = -Lx - <5$ (x ). (9) 

dt dt 

Hamiltonian equations preserve the total energy, and in a probabilistic context 
they have n(ir, v) as their invariant distribution. The proof (for the case T-L = 
M. N ) under regularity conditions (see e.g. [la]) is straightforward and based 
on the fact that the solution operator of (|9]) is volume-preserving and energy- 
preserving. One motivation for the selection of mass matrix M could be that 
directions of higher variance under the target Tl(dx) should be given higher 
marginal velocity variances and vice versa. In particular, selecting M = L seems 
optimal in the case when $ = and the target distribution is simply N(0, C), as 
we effectively transform the target to N(0, 1) and equalise all marginal variances. 
Caution is needed when % is an infinite-dimensional Hilbert space. Following 
[6] , one now is forced to choose the mass matrix M = L to end up with a well- 
defined algorithm. In this case the energy function will be: 

H(x, v) = $(x) + \ (x, Lx) + | {v, Lv) , x e H , (10) 

and the Hamiltonian equations will become: 

— =v, — = -x-C6$(x). (11) 

dt dt w v ; 

Showing in infinite dimensions that a solution operator of (111[) exists and pre- 
serves II(x, v) is harder: an analytical proof is given in [6], under carefully 
formulated assumptions on C and $. As the differential equations in (ITU can- 
not be solved analytically, HMC solves them numerically and then uses an 
accept/reject step to correct for violating preservation of total energy (and in- 
variance of II(a;, v)). 

3.2. Standard HMC onH = R N 

The standard HMC algorithm developed in [15| discretises the Hamiltonian 
equations © via a leapfrog scheme (we work under the selection M = L): 

Vh/2 = vo - | x - | C 6$(x ) , 
Xh = x + hv h/2 , (12) 

Vh = v h/2 -\xh-\ C8<b(x h ) , 



giving rise to an operator (#o,Vo) H> {xh,Vh) = VViOeoj^o) which is volume- 
preserving, and has the symmetricity property iph{xh, —Vh) = (%o, — Vo)- HMC 
looks at Hamiltonian dynamics up to some time horizon T > 0, via the synthesis 
of 

I=Vl\ (13) 

leapfrog steps, so we define tp^ to be the synthesis of / mappings iph- The stan- 
dard HMC is given in Table [1] (P x denotes projection on the x-argument). Due 
to t he p roperties of the leapfrog operator mentioned above, it is easy to ver- 
ify ([la]) that under regulatory conditions the employed acceptance probability 
provides Markov dynamics with invariant distribution Tl(x) in ([!]). 



HMC onR N : 

(i) Start with an initial value x^- ' £ M. N and set k = 0. 
(ii) Given x^ k > sample v^ k > ~ N(Q,C) and propose 

(Hi) Calculate the acceptance probability 

a = lAexp{-Aif(a; (fe) ,u (fe) )} (14) 

for AH = Hiipfav)) - H(x,v). 

(iv) Set x^ k+1 ' — x* with probability a; otherwise set x^ k+1 ' = x^ k ' . 

(v) Set k — > k + 1 and go to (ii). 

Table 1: HMC on R N , with target II (as) in (Q. 



3.3. Advanced HMC on H 

Applying the standard algorithm in Tabled] on an iV-dimensional projection 
of the SDE pathspace would give an algorithm for which: the proposal x* would 
become an increasingly inappropriate candidate for a sample from the target 
with increasing N ([6]); thus, the acceptance probability would vanish with 
increasing N, assuming parameters h, T were kept fixed. Indeed, employing 
the mapping ipi f° r elements of the pathspace would project Brownian motion 
paths to paths of the wrong quadratic variation which would then necessarily 
have had acceptance probability (see [9| for an analytical illustration in the 
case of MALA); the results in [8| suggest that one must decrease the step-size 



h as 0(N~ 1 / 4 ) to control the acceptance probability for increasing N. The 
advanced HMC algorithm avoids this degeneracy by exploiting the definition of 
the target as a change of measure from a Gaussian law. The development below 
follows [6] ; a similar splitting of the Hamiltonian equations as the one showed 
below is also used in [5j, but in a different context. 

Hamiltonian equations (fTTj) are split into the following two equations: 

|=0, £--«•(,); (15) 

dx dv 

dt dt v ' 

Notice that both equations can be solved analytically. We construct a numerical 
integrator for (jTTJ) by synthesizing steps on (TT5j) and (fT5)) . Analytically, we define 
the solution operators of (fT5j) and (fT6)l : 



H t (s,w) = (s J t;-«C«*(a!)); (17) 

Ht(ar, v) = ( cos(t) a; + sin(i) u, — sia(t) x + cos(t) v) . (18) 

The numerical integrator for (fTTj) is defined as follows: 

*/. = s h/2 ° 2h» o S h/2 , (19) 

for small h, h* > 0. We can synthesize steps up to some time horizon T . Defining 
/ as in (|T5|) . ^ will correspond to the synthesis of / steps ^h- ^i will provide 
the proposals for the MCMC steps. 

Remark 3.1. Critically, operators S t (a;,i>) ; S t (x,f) have the property that they 
preserve the absolute continuity properties of an input random pair (x, v) dis- 
tributed according to the Gaussian law: 

Qo(x,v) oc exp{—^(x,Lx) — ^(v,Lv)} , (20) 

(so, also of any other distribution absolutely continuous w.r.t. Qo). This is 
obvious for H t (a;, v) as it defines a rotation, so this map is in fact invariant 
for Qq. Then, as illustrated with Proposition \K7\ A s sumption 1 3. l\ guarantees 
precisely that also S t (a:, v) preserves absolute continuity of Qq. 



We will use h* such that: 



cos(h*)= 1 T ^, (21) 



though any choice is allowed. For this choice, it can be easily checked that the 
integrator (xq,vq) <-t ^h(xo,Vo) ='■ (xh,Vh) can be equivalently expressed as: 

v h/2 =v Q -l^±^-ICS^(x ), 
%h =x Q + h v h/2 , (22) 

h Xq + Xfr . 

vh = v h/2 - f qC 53>(x h ) , 
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which can now be interpreted as a semi- implicit-type integrator of ([TTjl . Under 
the interpretation (|22j). the justification for the choice (J2TJ) is that it delivers an 
integrator ^h that carries out steps of similar size, h, in the x and v directions, 
in accordance with standard HMC. 

The complete algorithm is determined in Tabled 



HMC on Hilbert space "H: 

(i) Start with an initial value x^ ' ~ IIo = N(0,C) and set k = 0. 
(ii) Given x^' sample v^' ~ N(0,C) and propose 

x* = r x ^{(^ k \v^) . 

(in) Consider 

a=lAex.p{-AH(x (k \v {k) )} (23) 

for AH = H^lfav)) -H(x,v). 
(iv) Set x^ k+1 > — x* with probability a; otherwise set x^ k+1 > — x*- k > . 
(v) Set k — » k + 1 and go to (ii). 

Table 2: HMC on H, with target Tl(x) in Q. 



Remark 3.2. XTie acceptance probability in the table is at the moment defined 
only formally, as H{x, v) = oo a.s.. To see that, notice that using the Karhunen- 
Loeve expansion in (0|) for x ~ IIo we have (x, Lx) = X^p=i £»? / or £p ^ -^(0, 1)- 
We re-express the acceptance probability in the following section in a way that 
illustrates that the difference AH = Hfif^X, v)) — H(x,v) is a.s. well-defined; 
from a practical point of view, for the N -dimensional projection used in practice 
one could still use directly the expression AH = H (^^(x, v)) — H(x,v) as each 
of the two H -terms will grow as 0(N). 

Remark 3.3. We will not prove the existence of a solution for the Hamiltonian 
equations on Hilbert space H5\) - D^\) or that the solution would preserve IL(x,v) 
as such proofs would require a level and amount of technicalities out of the scope 
of the paper. In Section \3.4\ below we will prove the validity of the algorithm 
in Table [H which uses directly the numerical integrators of these equations in 
\1T) - $TB) . This seems to suffice from a practical point of view: our proof be- 
low indicates that the algorithm will not collapse as N — ► oo but will converge 
to a limit, with N being the dimension of the vector used instead of complete 
infinite- dimensional diffusion paths when running the algorithms on a personal 
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computer; then, for a fixed finite dimension N one can resort to properties of 
finite- dimensional Hamiltonian equations to justify that, under standard regula- 
tory conditions, they will indeed preserve the N -dimensional target distribution, 
thus one can attain average acceptance probabilities arbitrarily close to 1 by 
decreasing the step-size h. 

3.4. Proof of Validity for Advanced EMC 

We define the operator ^l\ which is as ^ but with the non-linear parts 
set to zero, that is $ = 0. We consider the Gaussian product measure Qo = 
N(Q,C) <g> N(0,C) on H x H as in fl2DJ| and the bivariate distribution Q via the 
change of measure: 

Q(dx, dv) — exp{— $(x)} Qo(dx, dv) . 

We also consider the sequence of probability measures on 7-L x %: 

Q {i) = Q o *-< , 1 < * < / 

the sequence (xi,Vi) — ^1(xq,vq), and set: 

g{x) := -C 1/2 6<f>(x) , X€H. 



Note that under Assumption 13.11 g(x) is a well-defined element of the Hilbert 
space T-L a.s. under IIo. Using Proposition 12. 1[ we can now prove the following: 

Proposition 3.1. We have that: 

dQ« dQ^) . 1/2 

dQo dQ 

where we have defined: 

G(x,v) = exp{(^g(x),C- 1 / 2 v)-^g(x)\ 2 } . 



Proof. We will use the chain rule and Proposition 12. II Recall that for any two 
measurable spaces (E,£), (£",£'), probability measures M, Mq on (E,£) and 
1-1 mapping F : (E,£) M- (E',£' ), we have the following identity rule for the 
Radon-Nikodym derivative: 

We now work as follows. Following the definition of ^ from (fl9|) . we have the 
equality of probability measures: 

q« = q(- i )os-; 2 o^, i os-; 2 . 
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Thus, we have that: 

!^ iXi > Vi) = dQo (Xl ' Vl) 

d{Q^o~-j 2 oZ-}o~-} 2 } d{Q oE 7i } 2 } 

= 777; — =n {xi,vi)x — {xi,vi) 

d{Q o^ h/2 \ dQ 

= dQ~ Q &h/2( x i> v i)) x Gfa>Vi) 5 (25) 

we have used the chain rule in the second line, then (f24| and Proposition 12.11 
(in this case vq = f g(x)) in the third line. Using the fact that Qo o E~^} = Q 
and that also (3^* o 37, 2 )(:Ej, Uj) = 2^/2(^1-1,^-1) we have that: 



(E h j 2 (xi,Vi)) = — (E h/2 (xi-i,Vi-i)) 



Finally, working as in (|23|) we have that 



dQo 



(S A/2 (xi_i,Vi_i)) = 

d^-i) d{Q os-; 2 } 

= —^ (Xi-l,Vi-l) X — (Z. h / 2 (Xi-l,Vi-l)) 

dQo dQo 

= —37: — (ajj-i.Ui-i) x G(E h / 2 (xi-i,Vi-i)) . 
dQo 

The definition of3 h /2 gives ©(S^CaJt-i, u»-i)) = G{x i ^ ll v l ^ 1 + ^C l / 2 g(x i -i)). 
Following the calculation from (1251) we have now proven the requested result. 

D 



Thus, using Proposition 13.11 iterativelv we have now obtained that: 

^^(x I ,v I ) = ^{x Q ,vo)xT{G{xuv i )G{x i -i,v i _ l +±C 1 ! 2 g{x i - X )) . (26) 
dQo dQo f^ 

Now, using definition (| 19[) for any h, h* > we can show the following identity 
holds: 

G(xi,Vi) Gfa-uVi-! + I C^V^i-i)) = 

= \ (xi,Lxi) + \ (vi,Lvi) - \ {xi-i,Lxi-i) - \ (ui_i,£t;i_i) . 

Thus, we can rewrite (l26l) as follows: 



-£~-(xi, vi) = exp{Ai?(x , wo) - *(a*)} ■ (27) 

«Wo 

The above expression will be used at proving the main result below. 
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Remark 3.4. The operator ^>h (thus, also ^ h ) has the following properties: 

i) ^h is symmetric, that is ^h o S o ty h = S where S(x, v) — (x, —v). 

ii) if?h is (formally) volume-preserving, as it preserves volume when H = R . 

Theorem 3.1. The Markov chain with transition dynamics specified in Tabled 
has invariant distribution Tl(x) in [Ity. 

Proof. Assuming stationarity, so (xo,i>o) ~ Q> we can wr he for the next posi- 
tion, x' , of the Markov chain: 

x' = I [ U < a(*- / (x / , vi)) ]xi+I[U> a{x 0l v Q ) ] x , 

for a uniform random variable U ~ Un [0, 1]. Let / : % t-> R be bounded and 
continuous. We need to prove that: 

E[/(O]=E[/(x )]. 

Integrating out U from above we get: 

E [ /(a/) ] = E[ f( Xl ) a(x , v )]-E[ f(x ) a(x , «o) ] + E [ f(x ) } ■ (28) 

Note now that: 

E[f(xi) a{ X() ,vo) } = E(j W [ f(xi) a(^ 7 (£/, v r )) } 

^E Qo [f(x I )a(^ I (x I ,v I ))e^n I (^))-^r) ] 

= E Qo [ f( Xl ) ( 1 A e * W(x,,w)) ) e -»(x.) ] 

= E Q [/( l7 ).lAe M ^'< 1 '^] 

= E <}[/(«/) • 1 A e Aff(*^(^-^)) ] _ (29) 

Now, that due to the symmetricity property <&£ o S o \p£ = S of the leapfrog 
operator in Remark 13.41 we have that \l/^ o S = S o \l>£. Thus, we have: 

Ai7 (vl/- 7 ^, - WJ ))) = Aff (5 o ¥ h ( XI , vr))) 

= H(S(xr, vr)) - H(S o V{(xr,vr)) = -AH( qi , Vj ) , 

where is the last equation we used the fact that H o S = H due to the energy 
H being quadratic in the velocity v. Thus, using this in (|29|) , we have that: 



E[f(xr)a(x ,v a )] =E Q [f(x I )a(x I ,q I )} = E[f(x )a(x ,v a )] . (30) 

So, from (|28l) . the proof is now complete. D 



Remark 3.5. The demonstration of validity of standard HMC J la] does not 
require the recursive calculation of the forward density |^7| ) as it exploits the 
preservation of volume (unit Jacobian) for the mapping (xq,vq) h- > tp^XOfVo) 
to directly prove the analogue to J30\) . So, finding \27ty overcomes the difficulty 
of making sense of a Jacobian for the transform \&£ on the infinite- dimensional 
Hilbert space. 
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3.5. Advanced MCMC Samplers on Pathspace 

A number of MCMC algorithms corresponding to an upgrade of standard 
RWM, MALA and HMC on the infinite-dimensional pathspace are now avail- 
able. So far in Section [3] we have defined HMC on pathspace under Assumption 
13.11 In |9| the definition of advanced MALA on the pathspace is provided; also 
advanced RWM on pathspace was defined in that paper. We now briefly review 
these two local-move algorithms. 

The starting point for MALA is a Langevin SDE with drift ^CdlogIl(x) 
and coefficient C 1 ' 2 , that is, after a calculation on the drift: 

** i x _i C «&(*)+ C 1 /*^. (31) 

dt 2 2 w dt K ' 

In an Euclidean setting {wt} denotes a standard Brownian motion, whereas 
in the pathspace it denotes a cylindrical Brownian motion. In both cases, 
the process can be easily understood via the distribution of it's increments, 
as C l / 2 (wt+ y s Wt) ~ N(0,C). On pathspace, the SDE (2D is shown in [9fl to have 
invariant distribution LT under Lipschitz continuity and absolute boundedness 
assumptions on 5&. In the practically interesting case of nonlinearity, this SDE 
cannot be solved analytically. So, a proposal can be derived via the following 
Euler-type scheme on ([31]) for an finite increment At > 0: 



x* -x = -At(6^ + (l-0)%)-^C5<S>{x) + VAiN{O,C) . (32) 

Standard MALA is derived from an explicit Euler scheme with 8 — and ad- 
vanced pathspace MALA from a semi-implicit scheme with 8 = 1/2. Contrasting 
(j52")l with the leapfrog steps in (jT2"j) and (|2"2"j) . one can easily check that standard 
(resp. advanced) MALA is a particular case of standard (resp. advanced) HMC 
when choosing h = y/At and a single leapfrog step 7=1. 

Finally, a RWM algorithm on pathspace is derived in [9( via proposal (|32|) 
for 8 — 1/2 but also with omitting the nonlinear term C5$>(x). That is, the 
proposal for advanced RWM is: 



x* =px + y/l-p 2 N(0,C) , 

with parameter 

i _ At 

The Metropolis-Hastings acceptance probability for this proposal (see [9() is 
reminiscent of the one for standard RWM, namely 1A jfey , which also explains 
the interpretation of this algorithm as 'advanced RWM'. Table [3] summarises the 
three pathspace samplers looked at in this paper together with their standard 
versions for finite-dimensional spaces. 
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Algorithm 


Pathspace Proposal 


Standard Proposal 


HMC 

MALA 

RWM 


x* =V x ^ I h {x,v) 

x* = px + ^/l- p 2 v- ^C5${x) 

X* = p X + y/l — p 2 V 


x* = V x ^l{x,v) 

x* = (l- %) x -%C 5${x) + s/Aiv 

x* = x + %/Ai v 



Table 3: MCMC algorithms on pathspace together with their standard versions. In all cases 
v ~ N(0,C). HMC for 7= 1 and h = VAt coincides with MALA. 



4. HMC Superiority in an Analytical Study 

In the applications that we show in the next sections HMC appears to be 
much more efficient than MALA, even if they both use the same information 
about the target distribution in the form of the derivative <51ogn. So, the syn- 
thesis of deterministic steps for HMC, by avoiding random- walk- type behaviour 
for the Markov chain dynamics, seems to be providing significant computa- 
tional advantages for HMC. We will illustrate this analytically in this section for 
the case of a linear target distribution corresponding to an Ornstein-Uhlenbeck 
(OU) diffusion process. In particular, we will show that HMC gains orders of 
complexity compared to MALA and RWM when MALA itself does not gain a 
complexity benefit over RWM. So, an important message derived here is that 
designated use of information on the derivative can have significant effect at the 
efficiency of such MCMC methods. 

We will consider the OU bridge: 



dX t = -nX t dt + dB t 
X = Xg = , 



(33) 



for reversion parameter fc > and path-length £ > 0. From Girsanov's theorem 
(|33j) we get that the target distribution is defined on the Hilbert space % — 
L 2 ([0,£],M.) and is expressed as in the general form ((TJ with n = N(0,C bb ) the 
distribution of a Brownian bridge with x(0) = x(£) = and negative log-density 
<fr(x) = \ L x 2 dt + c for some constant c£R. We will look at the complexity 
of pathspace samplers as a function of the length £ of the bridge. Our main 
result summarises the mixing times as follows: 



RWM : 0{£ 2 ) 

MALA : 0(£ 2 ) 

HMC : 0(£) 



(34) 



The notion of mixing time is used here in an informal, practical manner and 
should not be confused with analytical definitions of various different versions 
of mixing times appearing in the Markov chain literature. In particular, the re- 
sults below obtain appropriate scalings of the step-sizes for the relevant MCMC 
samplers as a function of £ that deliver non- vanishing acceptance probabilities 
as £ grows. Then, informal arguments will be used to connect mixing times with 
the inverse of such step-sizes. 
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4.1. Acceptance Probability for RWM, MALA, HMC 

The proof in Proposition 14.11 below follows closely the derivations in the 
PhD thesis .39], where a similar scaling problem has been considered over the 
reversion parameter k. We have decided to include an analytical proof in the 
Appendix for reasons of completeness, but also because we have made some 
modifications that make it easier for the reader to follow the derivations. 

Karhunen-Loeve expansion for Brownian Bridge and OU Bridge 

The Karhunen-Loeve expansion (see Section^ of the Gaussian distributions 
corresponding to the target OU bridge and the reference Brownian bridge will 
be used in this section. In particular, we will use the orthonormal basis {4> p }p*^i 
of H corresponding to the eigenfunctions of C bb and make the standard corre- 
spondence x h-> {x p \ p < L l between an element x £ H and it's squared summable 
co-ordinates x p = (x, 4> p ) w.r.t. the basis {4> p }. In par ticular, the eigen-structure 
{^p>0p}^Li oi C bb is specified as follows (see e.g. |39|): 

£ 2 , . /2 . ,irpu 

X P = ^2p2' M u ) = \jj sm( — ) . (35) 

Then, the Karhunen-Loeve expansion of the two Gaussian distributions w.r.t. 
to the above basis of sinusoidals is as below (see e.g. 39]): 



°° £ i 

BB: x = > — £„ 4> p ; OU Bridge: x = > — £„ (j> v , (36) 

— ' 7TV — ' l ~1~1 

p=l y p=l 

where {^ p }^! are iid variables from ^(0, 1). 

Proposition 4.1. Consider the advanced MALA and RWM algorithms de- 
scribed in Table \3\ with target distribution H in i33\). If a = a(x,v) is the 
acceptance probability of the proposal for current position x and v ~ N(0,C bb ), 
then in stationarity (x ~ nj we have the following: 

• If At = c/£ 2 for some constant c > 0, then limsup^E [a] > 0. 

• If A.t — c/£ e for e £ (0,2) and a constant c > 0, then lim£_ s . 00 E [a] =0. 

Proof. See the Appendix. □ 

We will now derive a corresponding result for HMC. Recall that the step-size 
for HMC is denoted by h instead of At. We work as follows. Each leapfrog step 
^ht defined via (fP9|) . (|21|) . can in this case be written as a linear operator: 

p - (1 - p)K 2 C bb 

I-( P -(l-p)K 2 C bb Y 
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V h .= 




where we have set here: 

So, the mapping ^h above, can be equivalently expressed in terms of it's effect 
to the pth co-ordinates x p , v p of x, v respectively w.r.t. the orthonormal basis 
{4>p} in (f3"5j) as follows: 

P~ (1 -p)n 2 \ p 




Powers of the above matrix will be determined via it's eigenstructure. We will 
only consider the case when there will be complex eigenvalues, i.e. when: 

\p-(l-p)n 2 \ p \ < 1 , 

as in the alternative scenario there will be an eigenvalue of modulus greater 
than one (since the Jacobian of the above matrix is unit) whose powers will 
explode rendering the algorithm unstable. The above is equivalent to requiring 
that (4 — fa — -i 1 )/(4 + jt) lies in (— 1, 1), which can be easily seen to be 
guaranteed, for any £ > £q > and for all p > 1, under the condition: 

ck<2it. (37) 

This condition specifies the region of stability (see e.g. [32]) for the discretisa- 
tion scheme of the Hamiltonian dynamics in our context. Under (|37[) , we can 
conveniently write: 

vj; , i cos(<9 p ) a p sin(6> p ) \ / cos(I9 p ) a p sm{I9 p ) s ,._, s) 

''■" ! -^sin(0 p ) cos{9 p ) J ~\-±MI9p) cos(I9 p ) 

where we have set: 



co S (6 p ) = p - (1 - p) k 2 \ p ■ sin(0 p ) = ^l-cos2(^) ; a p = ^^ . (39) 

We can now derive the following result. 

Proposition 4.2. Consider the advanced HMC algorithm in Table\Mwith target 
distribution IT in A33\) . If a = a{x, v) is the acceptance probability of the proposal 
for current position x and v ~ N(0,C bb ), then in stationarity (x ~ H) we have 
the following: 

• If ' h — c/£ with en < 2it then lim inf^ E [ a ] > 0. 

Proof. See the Appendix. □ 
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So, under the selection h = c/t, the average acceptance probability is lower 
bounded by a constant for arbitrarily long bridges. 

Remark 4.1. We could now informally connect the above step-sizes that control 
the average acceptance probability for the advanced RWM, MALA and HMC 
algorithms with their mixing times that will involve their inverses, as stated 
in \34ty . One could think of the effect of the proposal of each algorithm for 
increasing £ on a fixed time-window for a path, say on [0,^o] for some £q > 0. 
For HMC, the synthesis of I — [tJ leapfrog steps will give a proposal moving 
the whole sub-path on [0,^o] an 0{\)-distance within it's state space. To show 
that, we ignore for a moment the effect of the nonlinear map E^/2 oi the the 
leapfrog update in U9\) and focus on the synthesis of I linear maps S^» . This 
gives: 






cos(Ih*) sin(Ih*) \ / cos(T) sin(T) \ ^ £ ^ ^ 

-sin(J7i*) cos(Ih*) J \ -sin(T) cos(T) / 



The effect of the nonlinear operator R h / 2 does not have a similarly simple inter- 
pretation, but should not offset the main effect of proposals making 0(l)-steps 
from a current position, for arbitrarily large t. Thus, as a function of i, the 
mixing time for advanced HMC only corresponds to the order of the number of 
leapfrog steps, 0(£). For advanced RWM, shown in Tabled for At — c/£ 2 we 
can express the proposal as: 

x* = (1 + <D(r 2 )) x + ^(l + 0{r 2 )) £ . (40) 

Here, due to the random-walk nature of the proposal, the algorithm will have to 
synthesize 0{£ 2 ) -steps to move 0(1) -distance from a current position for a fixed 
point of the sub-path in (0, £q], thus the 0{£ 2 )-mixing time. Finally, for MALA, 
one has to refer to the interpretation of the algorithm as a discretisation of an 
SDE on the pathspace, expressed in \31\) : without being rigorous here, advanced 
MALA essentially carries out steps of size At = 0{£~ 2 ) along the continuous- 
time dynamics, thus will require 1/ At = 0{£ 2 ) steps to propagate a point of the 
sub-path on [0,^o] on 0(l)-distance from it's current position. 

Of course, a rigorous analysis of mixing times would involve characterising 
the eigenvalues of the Markov chains, but this is beyond the scope of this paper. 

5. Advanced MCMC for Diffusion-Driven Models 

In this section we return to the general framework of Section 11.11 that can 
accommodate a wide range of applications. Recall that the statistical model here 
is driven by the solution V of the SDE @ involving some unknown parameter 9; 
then, conditionally on V and 9 there are some observations Y via the Lebesgue 
density p(y\v,9). We will now apply the previously defined advanced MCMC 
samplers in the context of such diffusion-driven models to efficiently sample from 
the path- valued distribution p(v\y, 9). Such models provide a generic framework 
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where target distributions defined as a change of measure from Gaussian laws 
as in ([T]) will naturally rise due to the probabilistic dynamics being driven by 
Brownian motion. We will first give a brief description of inferential issues 
related with such models, before we show details over the applicability of the 
advanced MCMC methods in such a context. 

The method in [36JJ succeeds in breaking down the singularity between V 
and the parameters in the diffusion coefficient c(-;#) through appropriate use 
of the Lamperti transform; see 27 1 and [28| for extensions. This framework 



covers the case of reducible diffusions, i.e. SDEs that can be transformed via an 
1-1 map to ones of unit diffusion coefficient. Now, [29[ offers some extensions 
to irreducible diffusions utilizing time change transformations; nevertheless the 



most general framework for irreducible diffusions is offered from [12], |2J] requir- 
ing only invertibility of the diffusion coefficient. For ease of exposition we will 
first consider the case of univariate (d = 1) reducible diffusions and work with 
the reparametrisation recipe of [36|. However, advanced HMC algorithms can 
also be defined under ((3|) with the approach of [24|, |l2|; details are postponed 
until Section [7] 

For reducible diffusions we can apply the Lamperti transform on the original 
SDE model ©, i.e. V t — > ri(Vt;9) =: X t where r](-;8) is an antiderivative of 
<T~ l {-]6). Assuming that &{-;9) is continuously differentiable, an application of 
Ito's lemma provides the SDE of the transformed diffusion X as: 

dX t = v{X t ;6)dt + dB t , X = r)(V ',9) , (41) 

where 

We must center the reference Gaussian law, so henceforth X will correspond to 
X — Xq. Girsanov's theorem gives that (for LTo being the standard Brownian 
motion on [0,^]): 

— (x\6)=exp\ v{x{s)+x{Q)-6)dx{s)-\ W {x{s) + x{0); 0)\ 2 ds] . 
"Ho Wo Jo > 

Thus, for the distribution of X conditionally on the data Y we have that: 

^( X \ y ,8)cxp(y\r,-\x ] 6),d)^-(x\e) 

cxexp{-$(x ;2 /,0)} , (42) 

where we have defined: 

$(x;y,6) = - v{x{s)-0)dx{s) + \ u 2 (x(s);9)ds 
Jo Jo 

-logp^ltj-^a:;^^). (43) 

Next, assuming a generic structure for p(y\v, 9) relevant for practical ap- 
plications, we will verify Assumption 13.11 and therefore demonstrate that our 
advanced HMC method is well-defined for such models. 
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5.1. Calculation of C 5<&(x) 

Motivated by the expression in (|4"5)l for $>(x\y,8) and the structure of the 
data density p(Y\V,6) arising in applications, we will carry out calculations 
assuming the following general form: 

*(a:) - a(a?(ti), x(t 2 ), ..., x(t M )) + P{h,h, ...,Il) + l(Si,S 2 , ..., Sj) (44) 
where we have set: 

h = I zi(s,x(s))ds , 1 < I < L ; Sj = rj(s,x(s))dx(s) , 1 < j < J , 
Jo Jo 

for positive integers M,L, J, times t\ < t% < ■ ■ ■ < tu in [0,^] that could be 
determined by the data Y and functions a,p,j,Zi,Tj determined via the partic- 
ular model. Explicit instances of this structure will be provided in the example 
applications in the sequel. Here, the target posterior distribution H(dx) is de- 
fined on the Hilbert space of squared integrable paths % — L 2 ([0,£],R) (with 
appropriate boundary conditions) . The centered Gaussian reference measure IIo 
will correspond to a Brownian motion (thus, boundary condition x(0) = 0) or a 
Brownian Bridge (x(0) = x(£) = 0). Recall here the specification of the covari- 
ance operators C bm , C bb and Cameron-Martin spaces Hq, Ti.^ of a Brownian 
motion and Brownian bridge respectively in Section [5J We make the following 
definitions, for the relevant range of subscripts: 

a m = ^ — (x tl ,xt 2 ,...,x tM ) ; pi = ^-(/i,/ 2 ,---,/l) ; 



dj to a o \ , 9zi , d 



7j = 777T (Si,S2,-.-,Sj) ; z[ = 



Ta = 



dS 3 y ±} "'"' J ' ' ' dx ' 3 dx ' 

Remark 5.1. With a somewhat abuse of notation, path-elements {C bm 8§(x)}, 
{C bb S$>(x)} found in Provosition \5.1\ below are obtained (at least for the terms 
in $(x) involving stochastic integrals) by recognising that the finite- difference 
N -dimensional algorithm used in practice corresponds to applying the finite- 
difference scheme on the Hilbert- space-valued algorithm employing precisely the 
shown paths {C bm S&(x)}, {C bb 5$>(x)} within it's specification. (So, here 5&(x) 
corresponds to a variational derivative only formally.) This remark applies also 
to a similar result shown in Proposition \7.1\in a subsequent section. 



Proposition 5.1. For the functional $(x) given in |^[ ), for any x G %: 

M 
(C bm S${x)) {u)=Y,a m -(uI[u<t m }+t l I[u>t m }) 

7n— 1 

+ / Pi ■ [u I z[(v,x(v))dv ds — / / z'i(v,x(v))dv dsj 
l=l ^ Ja Jo Jo ' 

+ S 7j • (« ( Tj {£, x{£)) + j dq, («))-/" j dq 3 (v) ds) , ue [0, £} , 



j=i 
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for the integrator 

dqj(v) — r'Av,x(v))dx(v) — drj{v,x(v)) . 

Also: 

M 

(C bb S${x))(u) = ^2a m -(ul[u<ti]+til[u>ti] -uti/l) 

m— 1 

+ ^2 l3l '{~jj / z 'i( v , x ( v ))dvds- / z[{v,x{v))dvdsj 

+ E ^ * (| / f d ^ ^ ~ f f ** ^ ds ) ' u G t°' £ ] • 

Proof. See the Appendix. D 

Thus, in both cases the first terms appearing in the specification of {C bm 5Q(x)}, 
{C bb S&(x)} in the proposition, are continuous and piece- wise linear in u (there is 
a turn at the time instances of the observations) so still lie within the Cameron- 
Martin spaces Hq , Tiff respectively (even if the variational derivative 5a itself 
will not necessarily lie within the Hilbert space, as shown in the proof). The 
second terms are clearly a.s. elements of the corresponding spaces Hq, Ti bb 
under weak continuity conditions on z[. Finally, for the third terms, again weak 
regulatory conditions on rj and r' guarantee that the corresponding paths in u 
are elements of the appropriate Cameron-Martin spaces. 

5.2. Example Models 

Diffusions Observed at a Discrete Skeleton 

We first consider the case of discretely observed diffusions. In other words, 

suppose that the diffusion V is observed at times {£,-, i = 0, 1, ... , M}, with 

to = and tu — (, and the observations are denoted with: 

Y = {Yi = V u , i = 0,l,...,m} . 

The observations on the transformed unit- volatility diffusion X, defined in (|41|. 
are then: 

y = {Y = rifXa e), i = o,i,...,M}. 

The reparametrisation of [36] involves another transformation that operates on 
each of the m independent constituent bridges. Between observations ij and 
ti + \ we define X t as 

X t = X t + iU+l ~ *?* + { \ \~ U)fi+1 =: b(X t ; 6) , U<t<t l+1 . 

H+l ~ H 
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For ease of exposition, we further assume that an antiderivative of v{-] 9) exists. 
Then, we can write the target log-density \og((dU/dH )(x\yi,y i+1 ,9)) for the 
path X between ti and t i+ i as (up to an additive normalising constant): 

-\J % l [i/(b(x s ;6);e)+is 2 (b(x s ;e);6)]ds, (45) 

where v'(x;9) denotes the derivative of v(x;9) with respect to x and IIo is the 
distribution of a standard Brownian bridge. 

Diffusions observed with error 

In this observation regime the data form a discrete skeleton of the diffusion 
path in the presence of measurement error. V is observed at the time instances 
{ti, i = 1, . . . , M}, with tu = £, with observations Y = {Y^, i = 1, . . . , M}. 
We consider the case of independent observations conditional on the diffusion 
process and work with the unit volatility diffusion X, such that Vt — r] _1 (X t ; 9). 
Without loss of generality we also assume that Xq = 0; otherwise X$ may be 
treated as an additional parameter. The probability density of the data given 
the latent path X is: 

M 
p(y\f 1 - l (x;9);9)=l[f(y l \r ! - 1 (x u ;9y,9) , (46) 

i=\ 

where / denotes the likelihood of the data given V, 9. From, (|46|) . (j42|) the target 
log-density \og((dH/dHo)(x\y,9)) is (up to an additive normalising constant): 

Y J [\ogf(y l \ir 1 {x u ;9)-9)+ f "' u(x a ; 9)dx s - i [^ v 2 (x s ;6)} 

where now IIo denotes the distribution of a standard Brownian motion. 

Stochastic volatility models 

Stochastic volatility are popular in modelling financial and econometric time 
series and instruments such as asset prices, interest rates and financial deriva- 
tives. In standard stochastic volatility models for asset prices, such as those in 
[26( and [25J , the log-price S is a diffusion whose volatility is driven by another 
diffusion. The SDE of the bivariate diffusion model has the following form 



dS t - H.(V t ;8)dt + pv a (y t ;6)dWt + ^T^7a s (V t ;9)dB t ; 
dV t = Hv(V t ;9)dt + a v (Vf,9)dWt , 

where B and W are independent Brownian motions and p reflects the leverage 
effect. For ease of exposition, we set p = 0. Stochastic volatility models with 
leverage effect are still into the framework of this paper and can be handled 
with the methodology described in Section [7J The Lamperti transform can be 
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applied on V to obtain the unit volatility diffusion X as in (|4ip . Without loss of 
generality we assume Xq = and a pair of observations, with yo, yi denoting the 
value of S at times 0, t\ respectively. The distribution of the data conditional 
on the path of X from to t\ has a known closed form and we can write: 



(48) 



yi \y ~ N(M y (x;9),V y (x;9)) ; 

M y (x; 9) =y + J 1 fi s (x s ;9)ds ; 

V v {x\9) = Jo* 1 a x (x s ;6) 2 ds ; 

x t = f]~ r {vt',9) ; 

dX t = u(X t ; 9)dt + dW t , < t < t x . 

Cases with more observations are handled by splitting the data into pairs of con- 
secutive points as above and then, using the Markov property for the bivariate 
diffusion (S, X), multiplying the corresponding densities calculated according 
to l[4"8"]). The log-density \og((dU/dU )(x\y,e)) for the latent diffusion X that 
drives the volatility for < t < t\ is (up to an additive normalised constant): 

rti 1 rti 

v(x s ]9)dx s - - I v 2 (x s - 1 9) , 



jVyiSlA 



[ yi -M y (x t ,8)y 
2V y (x t ,6) 



where IIo denotes here standard Brownian motion. 



Latent Diffusion Survival models 

Survival models target the probability of an individual i surviving up to time t 
or else P(Y > t), where Y denotes the event time. They are formulated based 
on the hazard function h(t) that reflects the probability that an event will occur 
in the infinitesimal period [£, t+dt). Under latent diffusion models ([l|, [35|), the 
hazard function is assumed to be a positive function h(-) of a diffusion process. 
For ease of exposition we work with the unit diffusion X defined by (|4"Tj) and 
assume Xq = 0. The distribution function for a single observation j/j is then 
given by: 



F (Vi) = 1 - exp ( 



u 



h(x s ;9)ds) , 0< yi <£ 



This formulation can also be interpreted as a random barrier hitting model. 

(Yi, . . . , Y n ), with Y n < t, can be 



The likelihood for observed event times Y 
written as: 



f(v\- 



n 

\Y[h{x Vi ;l 



i=l 



exp I —2, / h(x s ;9)ds 



(49) 



Hence, the log-density log((<fn/<ino)(a;|y, 9)) for the latent diffusion X becomes 
(up to an additive normalising constant): 



n 

i=1 >■ JO 'JO *JQ 

with IIo denoting the distribution of a standard Brownian motion as before. 



J log h (x Vz ; 9) - / h(x s ;9)ds\+ v(x s ; 9)dx s - - / v 2 (x s 
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6. Numerical Applications 

In this section, we employ the algorithms of Table [5] to perform various sim- 
ulation experiments involving diffusion bridges, stochastic volatility and latent 
diffusion survival models. In all these experiments, we treat the parameter vec- 
tor 6 as known and focus on the update of the latent diffusion path. The aim 
is to assess and compare the performance of the algorithms on various aspects 
including efficiency of the MCMC output and central processor unit (CPU) 
time. In order to measure CPU time in two different computing environments, 
the simulations for diffusion bridges and stochastic volatility models were car- 
ried out in MATLAB, whereas for the latent diffusion survival models the C 
programming language was used. The relative efficiency of these methods was 
measured through the effective sample size ESS = (f + 2 ^ fc>1 7(fc)) _1 , with 
Efe 7(^0 being the sum of (in practice) the first K sample auto-correlations 
(this will be estimated from the samples, as e.g. in [19]). A similar approach 
was also taken in [22J where the minimum ESS, taken over a number of the uni- 
variate MCMC trajectories, was used. In our context, the MCMC performance 
is assessed by monitoring the posterior draws of the diffusion, recorded at a fine 
partition of it's path, and reporting the minimum ESS over these points. The 
number K was set to a high enough value so that the minimum ESS, for a large 
enough number of iterations (set to 100,000) stabilises for all algorithms. The 
value of ESS was multiplied by a factor of 100 to reflect the percentage of the 
total MCMC iterations that can be considered as independent draws from the 
posterior. 

The employed MCMC algorithms consist of an independence sampler propos- 
ing from the reference Brownian path IIo and the advanced algorithms in Table 
[21 The algorithms were tuned to achieve certain acceptance probability levels 
that, according to our experience and previous literature, are associated with 
better performance. Specifically we aimed in attaining an acceptance probabil- 
ity around (15% - 30%) for RWM, (50%-70%) for MALA and (65%-85%) for 
HMC. To explore the performance of HMC we first fixed the number of leapfrog 
steps (e.g. to 5 or 10) and then recorded the minimum ESS for various levels 
of acceptance probability. We then considered cases with additional leapfrog 
steps. For each of these algorithms, we monitor the values of the minimum 
ESS, CPU times and their ratio in absolute and relative terms. The presented 
results contain the best version of these algorithms. 

6.1. Diffusions Observed at a Discrete Skeleton 

Consider the diffusion discussed in Section [4j i.e. an OU process with SDE: 

dX t = -nX t dt + dB t , < t < £ , 

with Xq — and an observation at time £ = 1 . We set X\ — and consider 3 dif- 
ferent values for k, i.e. 12, 20, 30 in our investigation of the MCMC performance. 
The algorithms were constructed using the Euler-Maruyama approximation of 
the target density (H5"j) from a time-discretised diffusion path. The MCMC 
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components comprise of the equidistant points from a discrete skeleton of the 
diffusion. The discretisation step was set to S = 0.02. Table [4] provides the 
results, i.e. the values of the minimum ESS, CPU times and their absolute and 
relative ratio. The HMC algorithm consisted of 5 leapfrog steps with the param- 
eter h set to values (0.43, 0.26, 0.17) for values of k (12, 20, 30) respectively. For 
advanced MALA, that can be thought as HMC with a single leapfrog step, the 
corresponding values of h — \/~Ai were very similar (0.45, 0.26, 0.18) indicating 
much smaller total steps. Overall, advanced HMC consistently over-performs, 
in terms of ESS, the remaining algorithms. In particular for k — 30, HMC is 
faster than the independence sampler by a factor of over 30. It's performance 
remains at high levels as we increase k and does not deteriorate as 5 becomes 
smaller, as indicated by the results obtained for S = 0.01 and S = 0.005. In line 
with the results of [39[ and Section [4) we note a substantial improvement over 
advanced MALA suggesting a more efficient use of the gradient within HMC. 
MALA offers some improvement over RWM and independence sampler, but at 
a heavy additional computational cost. The independence sampler performs 
reasonably well for n = 12 (acceptance rate of 16%) but it's performance drops 
substantially as k increases and the acceptance rate becomes smaller; 8% for 
k = 20 and 1.2% for k = 30. 



K= 12 




min(ESS) 


time 


min(ESS) 
time 


relative mi " (ESS) 

time 


IS 




3.9173 


4.8811 


0.8025 


2.1733 


RWM 




3.9584 


5.8925 


0.6718 


1.8192 


MALA 




4.0112 


10.8626 


0.3693 


1 


HMC 




35.7274 


20.8695 


1.7119 


4.6361 


HMC (5 = 


= 0.01) 


35.8903 


32.5594 


N/A 


N/A 


HMC (S = 


= 0.005) 


35.5875 


51.6085 


N/A 


N/A 


k = 20 




min(ESS) 


time 


min(ESS) 
time 


relative mi " (ESS) 

time 


IS 




0.5013 


4.4977 


0.1115 


l 


RWM 




1.0086 


5.4445 


0.1853 


1.6621 


MALA 




1.6202 


10.0588 


0.1611 


1.4452 


HMC 




26.6214 


20.8841 


1.2747 


11.4369 


k = 30 




min(ESS) 


time 


min(ESS) 
time 


relative mi " (ESS) 

time 


IS 




0.1012 


4.7043 


0.0215 


l 


RWM 




0.4343 


5.7229 


0.0759 


3.5277 


MALA 




0.5372 


10.0438 


0.0535 


2.4863 


HMC 




13.3350 


20.4831 


0.6510 


30.2631 



Table 4: Relative efficiency via the minimum ESS (%) and CPU times (seconds), for the 
advanced pathspace algorithms - Case of OU bridges. IS denotes the Independence Sampler. 
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Sampler 


min(ESS) 


time 


IOC 


v min(ESS) 
time 


relative mi " (ESS) 

time 


RWM 


0.1400 


161.8298 




0.0865 


1.3561 


MALA 


0.2181 


341.8737 




0.0638 


1.0000 


HMC (5 steps) 


2.5695 


689.6767 




0.3726 


5.8400 


HMC (10 steps) 


8.1655 


1188.1201 




0.6873 


10.7729 


HMC (20 steps) 


8.3216 


2200.1311 




0.3782 


5.9288 



Table 5: Relative efficiency, via the minimum ESS (%) and CPU times (seconds) for the 
diffusion pathspace algorithms - Case of stochastic volatility paths. 



6.2. Stochastic Volatility Models 

The following stochastic volatility model was used to simulate data: 

dS t = exp(V t /2)dB t , < t < £ ; 
dV t = k(h - V t )dt + adW t . 

The parameters were set according to previous analyses based on similar models 



for the S&P 500 dataset [l2|. Specifically, we set k = 0.03, fj, = 0.07, a 2 = 0.03 
and Vo = 0. We consider about a year measured in days (£ = 250) and recorded 
observations at a daily frequency (250 data points). The transformation of Vt to 
a unit volatility diffusion was utilised to write the target density and construct 
the HMC algorithms. The model for a pair of consecutive observations, (j/i-i, y£) 
can be written as: 

Vi\yi-i ~ iV(j/»_i, f**_ x exp(ax s )ds) ; 
x t = Vt/o- ; 
JX t = k (f - X t ) dt + dW t , t <t<h. 

The results are shown in Table [5] Independence sampler performs very poorly 
in this case, with an acceptance rate below 10 -4 , and is omitted from the ta- 
ble. MALA provides a small improvement over RWM which is nevertheless 
not enough to cover the associated increase in the corresponding computations. 
However, this is not the case for HMC that reaches it's optimal performance 
roughly at 10 leapfrog steps. Advanced HMC offers considerable improvement, 
being nearly 8 times faster than RWM and 11 times faster than MALA. Param- 
eter h that corresponded to the desired acceptance probability levels was 0.085 
for MALA algorithm and 0.075 for all the versions of HMC. 

6.3. Latent Diffusion Survival models 

This section provides a numerical illustration in simulated data from a latent 



diffusion survival model appearing in [35] . Specifically, the likelihood for event 
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Sampler min(ESS) time 100 x mi " (ESS) relative min(ESS) 



time time 



RWM 0.1039 55.2342 0.1881 1 

MALA 0.6466 87.5021 0.7389 3.9284 

HMC 25.2985 248.0301 10.1997 54.2229 

Table 6: Relative efficiency, via the minimum ESS (%) and CPU times (seconds) for the 
advanced pathspace algorithms - Case of latent diffusion survival model. 

times Y — {Yi, . . . ,Y n } is given by: 

V {y\^\x-B)-B) = [II< ex P(-E/ x2 s ds 
»=i i=i ^° 

where we have considered the model: 

dX t = -(lAsm(X t )dt + l)dt + dB t , X = 2 . 

Table [5] provides the measures of performance of the algorithms in Tabled The 
calculations in this section were obtained using C programming language, unlike 
the previous two applications where MATLAB was used. Similarly with the 
stochastic volatility simulation experiment, independence sampler is associated 
with extremely low acceptance rate rendering it infeasible in practice. RWM also 
performs poorly. A very small step is required to achieve the desired acceptance 
rate, thus resulting in very small moves around the diffusion pathspace. MALA 
with h = 0.2 performs better in this case, but massively better performance 
is offered by advanced HMC. Specifically, HMC with 10 leapfrog steps and 
h = 0.15 is about 54 times faster than RWM. Figure Q] depicts the trajectory 
of X t , determining the hazard function, that was used to generate the data. 
The figure displays 95% pointwise credible intervals obtained from the HMC 
algorithm appearing in Table [6] 

7. Robustness in Dimension in Further Scenarios 

We will briefly illustrate the well-definition of advanced HMC algorithms 
for cases when the target distribution of the SDE-driven models can have a 
structure different from the ones considered so far. In what follows, we omit 
reference to the parameter 9. 

7.1. Beyond the Lamberti Transform 

In the context of a non-scalar diffusion Vt , defined via the equation (J5J| , it is 
not at all guaranteed that V can be transformed into an SDE of unit diffusion 
coefficient. Indeed, the Lamberti transform in such a context would look at the 
existence of a mapping Vt i-> rj{X t ) (with r] = (rji, 772, • • • , T]d) T '■ R d | — > R d ) such 
that, for all v in the state space of Vt : 

Dr)(v) ■ a(v) = I d (50) 
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Figure 1: 95% Pointwise credible intervals (blue dashed lines) overlayed on the true path of 
X (red solid line). 



where Dn = {drji/dvj). This follows directly by Ito's formula, see e.g. [2j. The 
work in [2J also shows that existence of a mapping r\ with the property (1501) is 
equivalent to the diffusion coefficient matrix satisfying dcr~- /dvk — ddik/dvj 
for all i,j, k with j < k. This restricts considerably the applicability of the 
Laberti transform for non-scalar diffusions. There are however other methods 
suggested in the literature of a wider scope. Our advanced MCMC algorithms 
require posterior distributions on pathspace which are absolutely continuous 
w.r.t. Brownian motion-related distributions, so it is of interest to briefly verify 
their well-definition on the pathspace when using such alternative transforms. 
The method suggested in [l2| maps V t onto the driving Wiener noise X t = B t 
of the SDE. Assuming some data Y with likelihood p(y\v), and since the prior 



on X is simply the Wiener measure n = N(0,C ), we can write the posterior 
on X as: 

—— (V) (xp(y\v) . 
cfllo 

It remains to calculate C bm 8<&(x) is this context. Differentiation of $(x) will 
involve finding derivatives of v w.r.t. the driving noise x. So it is not a surprise 
that the dynamics of the Malliavin derivative D s Vt (see e.g. [l8() will appear in 
the calculations; this is defined as below: 



dYt 

Y t 



n'(v t )dt 

D s V t = 



cr'(V t )dB t , Y 

a(V s )I[s<t] . 



1 



We will assume here the general structure for $(x) = — \og(p(y\v)) (compared 
with the structure assumed earlier in (1441) we do not include here stochastic 
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integral terms to avoid overly cumbersome expressions): 

$0) =$ v (v(x)) =a(y(ti),v(h),---,v(t M )) 

+ /3( / zi(s,v(s))ds, / z 2 (s,v(s))ds, / z L (s,v(sj)ds) . (51) 
Jo Jo Jo 

The terms a m , /3; appearing below correspond to partial derivatives of the func- 
tionals a, j3 as in the case of Proposition 15. II 

Proposition 7.1. For i/ie functional $(x) given in i51\), for any x G %: 

M , ruAt m 



F dr 



(C bm S$(x))(u) = ^ a m • ( (uAt m ) (F m>tm + er(tO) - 

m— 1 

+ ^ A • (« (G M + Ai) + I (G Lr + Ji, r )dr) , u e [0, £] , 
/or t/ie processes, for m — 1, 2, . . . , M and ? = 1, 2, . . . , L: 

Fmr = [ r e jt™ (n>{v u )duW{v u )dx^) dQs . 

Jo 

G Ur = r f ^(*.«t)e / »^' K)du+CT ' K)dx "^^Q s ; 

Jl,r = / «/(s,U s )o-(u s )ds , 

Jo 
with integrator: 

dQ s = cr(v s )(n'(v s )du + a'(v s )dx s ) - da(v s ) . 

Focusing on the properties of the calculated path C 5<&(x) over it's domain 
u £ [0,£], it is easy to check the following: a.s. in x, the first terms are con- 
tinuous, piecewise linear with a turn at the data instances t\,t2, ■ ■ ■ , £/wfi then, 
under the weak assumption that the processes r H ► G^,,, r H ► J; r are a.s. con- 
tinuous, we have that the second terms in the calculation in Proposition [7J] are 
a.s. differentiable. Thus, under weak conditions Assumption 13 . 1 1 is satisfied and 
advanced HMC is well-defined on the pathspace. 

8. Discussion - Further Directions 

In this paper we developed and applied a general class of derivative driven 
algorithms, suitable for Hilbert spaces defined by diffusion driven models. The 
validity of the algorithm was established using different theory than [6| , which 
allowed us to relax involved assumptions and generalise to widely used diffusion 
models across various scientific fields as mentioned in Section 11.11 The mod- 
elling framework contains diffusions observed with error, partial data, as well as 



30 



observation on functionals of the diffusion process such as hitting times and in- 
tegrals thereof. The extended framework also includes diffusions processes with 
general diffusion coefficient that can can be handled with reparametrisation to 



the driving Wiener noise as in [12j and [24j . A number of research directions 



related with the methods presented in this paper have remained open: 

• The algorithm can be used in contexts where a Gibbs data augmentation 
scheme is adopted to facilitate the step of updating the diffusion path 
given the parameters. The entire diffusion path can either be updated 
in a single step, but can also be integrated to updates based on overlap- 
ping and potentially random-sized blocks. Note that while this approach 
will boost the conditional step of the diffusion path updates, another im- 
portant aspect for the overall performance of the MCMC sample is the 
posterior correlations between diffusion paths and parameters. The role 
of reparametrisation of diffusion paths is critical for this task. We are 
currently investigating extensions of the advanced HMC algorithm on the 
joint density of diffusion and parameters. In the numerical applications of 
this paper the high dimensional diffusion path was successfully updated in 
a single block, a fact that provides supporting evidence that the algorithm 
could still be very efficient when also incorporating the low dimension pa- 
rameter vector 9 in it. Such an approach will provide an alternative to 
pseudo marginal 4] or particle MCMC [3( formulations for diffusion driven 
models; see e.g. [37J and [iff respectively. 

• In this paper the choice of the mass matrix for the Hamiltonian dynamics 
is guided by the prior, i.e. by the reference Gaussian law 7V(0, C). Re- 
cent work in the literature ([23]) has looked at exploiting the geometric 
structure of state-space to consider location-specific mass matrices. It is 
therefore worth investigating the choice of location-specific mass matri- 
ces in the context of infinite-dimensional pathspaces to further boost the 
efficiency of MCMC methods. 



• 



An important direction of further research is related with the calibration 
of long-memory diffusion models: an important class covers the case of 
processes driven by fractional Brownian motion (see e.g. 38l|31(). Apply- 



ing standard Independence samplers over smaller blocks of the complete 
latent path will be overly expensive as calculations have to be done every 
time over the whole of the path due to the lack of Markovianity. In this 
case it could be very beneficial to update the whole path together using 
HMC and control the acceptance probability by trying different leapfrog 
steps. 
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Appendix A. Proofs 

Proof of Proposition \4-l\ We use the notation \-\l 1 = E | • | . For both algorithms 
(RWM and MALA) the acceptance probability can be expressed as: 

a(x,v) = lAe flM 

for the relevant variable R(x,v). Working as in [3PJ, for the first result it 
suffices to prove that sup f \R(x,v)\l 1 < oo since for any A > |-R|li we have 
the inequality: 

E[lAe R ] > e - A (l-!i^ii-) . 

For the second result, we will be identifying a term in R with the highest order 
Li-norm, which we will denote by J. Crucially, this term will lie in (—00, 0), thus 
will be growing to —00 faster than the growth of \R— J\l x - This provides some 
intuition for the zero limit of the average acceptance probability. Analytically, 
as in [39(, we will be using the following inequality, for any 7 > 0: 

E[lAe B ] <P[i?> -7]+e~ 7 

= P[{R>- 1 }D{\R-J\ <j}]+P[{R>-j}D{\R-J\ > 7 }]+e~ 7 
<P[J>-2 7 ]+P[|i?- J\ > 7 ]+e" 7 

<P[J>-2 7 ] + |j *~^ |l ' +e-" f , (A.l) 

where we used the Markov inequality in the last line. So, the idea here will be 
to choose some 7 growing to 00 faster that the rate of growth of \R — J\l x and 
slower than the rate of growth of | J\l x - 

From [39], we retrieve the following collection of results: 

fm 2 — e 1 ■ f irvi 2 — i 4 * - ^ a i ■ 

J -M'*'l ' 2Ktanh(«£) 2k 1 ' ^ l l "*'l " 90k 1 2kF 6kJ ~ 2k 5 tanh(«£) ' 



'2 C 2 



E[(X,CX)] - -±§I 2 ^tanh( K £) ' ( A " 2 ) 



E[(.t,C 3 x)] 



3_\ 1 _ 945+315k^£^-21k 4 £ 4 +2k 



2*>2_ 01 „4»4 , o„6»6 



1890re 8 2k 7 tanh(ic£) ' 

(1 • E[(v,Cu}] = |o ; 



E|«| 2 = ^; E[( Vj C«)] 



E[(x,«) 2 ] = ^l- 5 ^i E(ay ; E[(CM)»] = ^^fW ; 

T^T/^2^ „,\2] _ 467775+155925k 2 1 2 -10395k 4 1 4 +990« 6 1 6 -99k 8 1 8 + 10k 10 1 10 /a q\ 

£j[{L X,V) j - 467775k 12 ■ t A -^ 

For the case of RWM, analytical calculations will give: 

R ( x > v) = 2(1 ^L )2 At (x, x) - 2(1 +L )2 At («, w) - K (1 ( JLy 2 } V&i (x, v) . 

Now, when At = c/l 2 , from the results in (|A.3[) we find that the Li-norm of 
each of the three above summands is 0(1), so sup ; \R(x,v)\l 1 < oo. When 
At = c/£ e with e € (0, 2), a careful inspection of (IA.3I) gives that the term with 
the highest order Li-norm is: 
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Indeed, we have that: 

\J\ Ll = 9(A« 2 ) ; \R - J\ Ll = 0{VAi£) . 

We will apply (|A.1|) for the selection 7 = (At£ 2 ) 2 / 3 . Clearly, the last two terms 
in the R.H.S. of (|A.1[) vanish as £ — > 00. For the first term we need to use the 
analytical definition of J. Using the rescaling properties of the Brownian bridge, 
we can write Va = \ivt f° r a standard Brownian bridge v on [0, 1]. Thus, we 
can re- write: 

(v, v) = f v 2 dt = £ f v 2 e dt = £ 2 f i 2 dt = £ 2 \v\ 2 . 
Jo Jo Jo 

We now have that: 

P[J>-2 1 ]=P[\v\ 2 <^±^^]^P{\v\ 2 = Q} = 0. 
For the case of MALA, some tedious calculations detailed in [39( will give that: 

R & V ) = 8 (i+A«)2 Af2 («. *) - 32(1+V) 2 A ^ (C:C ' CX) " 8(l ( +"ff Ai ' <*' CjC > 
*2/~ r>„A k 6 A+3/^2^ /i„\ , K 4 (l-^) a. 2. 



W AH (x, Cx) - "a^ AH p, Car) + " Ia/J AH (Cz, Cz) 



8(1 + ^) 2 \"' ' 32(1 + ^)2"" r "' / ' 8(1 + 

+ g(iTW Af5/2 (C:E ' w) _ Sfew Ai3/2 (Cx ' v) + W^W At5/2 (c2a; ' v) ■ 

Now, when At = c/£ 2 , from the results in (JA.3I) we find that the Li-norm of 
each of the twelve above summands is indeed 0(1), so sup ; \R(x,v)\l 1 < 00. 
When At = c/£ e with e € (0,2), using again the results in (|A.3|) . we identify, 
among the twelve terms in the definition of R(x, v), the one with the highest 
Li-norm: 

J=- 32(1 h l ¥)2 At 3 (C 2 x,Cx) . 

Indeed we have that: 

\J\ Ll = e(At 3 £ 6 ) ■ \R - J\ Ll = 0{At b ' 2 £ 5 ) . 

We note that the second result above comes from the bound on the Li-norm 
of the last summand in the definition of R{x,v). In this case we apply (|A.1[) 
under the choice 7 = (At^ 2 ) 11 / 4 . As before, under this choice the last two terms 
on the R.H.S. of (jA.ip vanish as £ — » 00. For, the first term we must use the 
analytical definition of J from above. To handle J we can use the Karhunen- 
Loeve expansion of Gaussian measures. In particular, using the representation 
for an OU-bridge from (l36t we get that: 

-A 1 £ 6 

(x,C x) = 2^ -g-g £ 2 . 

p=i ~p- + « ** 
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Thus, wc can now obtain: 



oo 

P[J>-2 7 ] = P[£^^^ p 2 <^ 



>-f 



! 2 v e p 6 Q, p i K 6 (Atl 2 ) 1 / 4 - 

P =l ** 

< P [^ ££ < ^#^ (S^ttt] -> P [& < 0] = . 
The proof is now complete. □ 

Proof of Proposition \4-2\ We exploit the representation of ^ in (|38|) . Recall 
that we denote (xt,Vi) = ^(xo, «o), for number of leapfrog steps < i < I. 
After tedious calculations, one can obtain: 

AH(x , v ) = H(x I ,v I ) - H(x ,v ) = (Ax , X ) + (Bv , v ) + (gx ,v ) (A.4) 

for the operators: 

A= -\ sm 2 (I9)V ; B = ± sin 2 (19) a 2 V ; 5 = 5 sin(2I0) a? ; 

V = K 2 I + (l-^)(C bb )- 1 . (A.5) 

We have used operator's notation: sin(/#) is the operator such that we have 
sm(I9)x — {sm(I6p) x p } p x L 1 where {x p } are the co-ordinates of x £ H w.r.t. 
to the orthonormal basis corresponding to the eigen- functions of C bb ; a similar 
interpretation stands for the operator sin(2/0). Also o?x — {a 2 x p } p % 1 . The 
sequences {9 P } and {a p } are defined in ((39]). 

We denote here by C the covariance matrix of the target OU bridge; the 
Karhunen-Loeve expansion (J36J) implies the eigenstructure {\ P ,oUi4 l p} P x Li f° r 
C ou with eigen- values: 

K,OU = „ 2p2 - ■ 

~~p r K 

Examining the eigenvalues {Ppjp^i olV, after some calculations one can verify 
that: 

v^c n hc—4 h2 



-OU (2+ ^)(l + p) ' 

so that: 

0<P P <M\;i ou A p ±, (A.6) 

for some constant M > 0. Again, after some tedious calculations we get that: 

a, p = \ \,ov c p ; C P : = ( ^3^2 ~ 4„2 7r 2 ) • ( A - 7 ) 

Note that the term on the R.H.S. of this last definition of c 2 is guaranteed to 
be positive due to condition (1371) . In particular we have that: 

c 2 p <M , (A.8) 
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for a constant M > 0. Now, starting from (IA.4|) we have that: 

E [ (AH) 2 ] = E [ (Ax , x ) 2 ] + E [ (Bv Ql v ) 2 ] + E [ (Qx Q , v a ) 2 ] 

+ 2E[(Axo,x )]V[{Bvo,vo)] (A.9) 

as the rest of the expectations will be zero. Henceforth, {A p }TL lt {B p }^ =1 , 
{G p } p < L l denote the eigenvalues of the operators A, B, Q respectively. Recalling 
that (Axq,Xo) = J2^Li A> x o, P i we have: 

E [ (Ax ,xq) 2 } = Var [ (Ax , x ) } + E 2 [ (Ax ,x )} 



p— 1 p— 1 



ou ) 



Doing similar calculations also for the rest of the terms on the R.H.S. of (|A.9|) . 
we have: 

oo oo oo 

E [ (AH) 2 } = 2 J^ K \l ou + 2 J2 B p K + ( E ( A p X p,ou + B p X p ] 

p— 1 p— 1 p — 1 

OO 

+ 2_^ *~*p ^Pi° u ^p ■ 



Thus, using the representations and bounds in (|A.5|) . (|A.6j) . ()A.7|) and (|A.8j) . 
we have that: 

oo oo 1 oo 

E A p A p^ < M Y, Klu x l ¥ %,ou = M E ^1 < °° ■ 

p— 1 p— 1 p— 1 

Similar calculations will give: 

oo oo ^ oo 1 

Y. B lK<uY.K 2 ^ov4Khu^ x2 ^ M ^lh < ™- 

p— 1 p— 1 p=l 

and: 

oo oo ^ oo 1 

G p \ P ,ou A p < M 2_^ V \,ou c p X pOU X p — X P: ou X p < M ^ ztzz < °° 

P=l p— 1 p— 1 

Finally, we have that: 

oo oo 

I E ( A p X p> ou + B p X p ) I = 2 I E sin2 ( /6 W P p (-Ap,Oi7 + a 2 p Xi) | 

p— 1 p— 1 

OO 



= i| ^sin 2 (^ p )P p A p ,oc/(c 2 -l) 
p=i 

oo 1 oo 1 

< m E a p^ = m E^3 <o ° 



p=l p=l y 
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Thus, we have proven that sup ; E [ AH 2 ] < oo, which, as illustrated in the proof 
of Proposition 14.11 is sufficient for our proof. 

□ 



Proof of Proposition \5.1\ We will do our calculations using the right- most ex- 
pressions in ((4]), ([5]), where C bm , C bb are respectively specified. For the first term 
in the expression for $, namely a = a(x(ti), xfo), • ■ • , #(£m))j we can formally 
write: 



M 



(Sa)(s) — ^J a m ■ 6 ti (ds) 



where St t is the Dirac measure centered at ij. Applying C bm and C bb will 
give immediately the terms in the first lines of the expression for C bm 8$(x) 
and C bm 5<&{x) in the statement of the proposition. For the second term /3 = 

p(h,i 2 ,...,i L y, 



L 



(5f3)(s) = }^(3 l -z' l (s,x( s )) . 
i=i 

Again, applying C bm and C bb will give the terms in the second lines of the 
expression for C bm 5&(x) and C bm 5&(x) in the statement of the proposition. 

We proceed to the term 7 = "y(Si, S2, . . • , Sj) with the stochastic integrals. 
The algorithm applied in practice will involve a finite-difference approximation 
of the stochastic integrals {Sj}. Below we will sacrifice accuracy at the notation 
to avoid taking too much space for what involves otherwise straightforward 
derivative calculations. Consider the discretised time instances = So < Si < 
• • • sn-i < sn — £■ Denoting three consecutive discrete time instances among 
the above by s_ < s < s+, the finite-difference approximation, say S^ , of Sj 
can be written as follows: 

s£{si,...,sn} 

We can now calculate the partial derivative of Sj w.r.t. to the one of the N 
variables, x(s), making up the discretised path. Notice that x(s) will appear in 
two terms in the summation, unless it is the last point x(sn) of the x- vector 
when it will only appear once. This explains the following calculation of the 
partial derivatives: 

^-p- = Ag(s), s €{si,...,Sjv_i} ; g , 3 . =r j (s N -i i x(a N -i)) , (A.10) 

where we have defined: 

Aq(s) = r / (s,x(s))(x(s+) - x(s)) - (r 3 (s,x{s) - rj(s-, x(s-))) . 

Overall, we have that: 

9 7 _ ^ dSf 

diV) = f^i 13 ' diV) ' (A,11) 
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Then, the N x N discretised covariance operator C bm — (min{sj, Sk})i,k, corre- 
sponding to the covariance matrix of a standard Brownian motion at the time 
instances Si,S2, • • . ,Sn (this is the discretised version of C bm ), can be easily 
shown to apply as follows to a finite-dimensional vector / £ M. N : 



N 



(C'^/)„ = s„-(X>)-E(I^) Asfe + 1 ' «=1 > 2,...,JV, (A.12) 



i=l fe=l i=l 

where As k+1 = Sk+i - s k . Combining (|A.10|) . (|A.11|) . (|A.12[) will give (with V 
denoting the gradient): 

(C 6m V 7 )„ = 

J JV-1 «-l fc 

^7j • (s«(r :/ -(sjv-i,a;(sAr-i))+ XI A <?( s 0) + X! (XI Aq( Sl ))As k+1 J 

j=l i=l fc=l i=\ 

which can now be recognised as the finite-difference discretisation of the term 
appearing in the third line of the expression for C hm 8Q{x) in the statement of 
the proposition. A similar approach for the Brownian bridge case, considering 
the discrete time instances = sq < s\ < ■ ■ ■ sjv-i < sjv < s^ + i = £ , and the 
corresponding iV-dimensional matrix C bb represented as below: 

N k u-1 k 

(C 66 /)u = ^-X(X/«)A Sfe+1 -X(X/ 4 )A Sfc+ i, u = l,...,N, (A.13) 

k=l i=l fc=l i=l 

where now: 

c/ i oa;(s,) 

s£{si,...,s N + i} 

for Aq(s) as defined earlier and 1 < i < N. This will give the calculation: 

,7 TV fc u-1 fc 

( ctt H^E7r(y(E(E A ^)) A ^i-E(EA^))A SM 



j=l fc=l i=l fc=l i=l 

immediately recognised as the finite-difference discretisation of the term ap- 
pearing in the third line of the expression for C bb 8Q(x) in the statement of the 
proposition. 

□ 

Proof of Proposition \7.1\ Consider a collection of discrete time instances < 
s\ < S2 < ■ ■ ■ < sjv with so = and sat = i that include the data instances: 

{h,t 2 , ■ ■ ■ ,t M } C {si,s 2 ,. . . ,s N } . 
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Let Asi — Si — Si— i. We will consider the following finite-difference approxima- 
tion $(x) = $(aJi,£2, • • • j xn) of the negative log-density: 

$(») = $i(^) + $ 2 (a;) =a(uj 1 , Vi 2 ,...,Ui M ) (A.14) 

W W W 

+ /?( X! ^i(si-i,Vt-i)Asi,y^ z 2 (si_i, Vi_i)Asi, ...,y^ z L (si_i, Wi_i)AsjJ 

Z — 1 2 — 1 Z— 1 

for indices *i, 12, ■ . ■ Im such that Si m — t mi for m = 1,2,... M, and vector v 
constructed via the finite-difference approximation: 

Vi = Vi-i + n(Vi-i)ASi + cr(w 4 _i)Axi , 

for i = 1,2, ...,N with vq equal to a specified fixed initial condition. We 
will be using the obtained expression in (|A.12[) for the N x N covariance matrix 
C = C bm = (min{s,;, Sj})ij of a standard brownian motion at the time instances 
si, S2, ■ • • , sn- The function $ : K d 1— > R in (|A.14[) and the matrix C fully specify 
a finite-difference approximation of the original target defined on the Hilbert 
space. 

Now, we have the following recursion for the derivatives 

y.._9Vi_ 

for any j > 1: 

*i,3 = *i-i,j + m'( w »-i) ^-i.j As, + o-'Oj-i) ^-1,3 Aa; i ! « > j + 1 
y i+iJ = y w + M>j) Fj,jAsj+i + cr'(^-i) Y,j Ax j+ i - cj{v 3 ) ; 

Yi,j = , i < j . 
So, we can obtain that, for i> j + 1: 

log(Y itj ) = log(yi_ij-) + log (1 + /u / (u i _ 1 )As i + o-'(u i _ 1 )Aa; i ) , 
and using this recursion we get that: 

Y itS = AQ 3 x e E *-*+ a log (i+^(v k ^)As k +a'(v k ^)Ax k ) f i > j + 1 , (A.15) 

r„- = ct(^_i) (A.16) 

r 2J = , i < j , (A.17) 

where we have set: 

AQj = Y j+1J = <t(vj-i)(j/ (vj)As j+ i + <r'{v 3 )Ax 3+1 ) - Aa{v 3 ) , 



HS 



and Aa(vj) = cr(vj) — ct(«j_i). We will also define for 1 < m < M and 
1< I < L: 



F - V Y 

± m,r — / 1 i m ,3 5 



r < «,-, 



j=i 



Gy = ^ ( ^ z'i(8i,Vi)Aa i+1 ) Y itj ; 

r 
Jl,r = Y Z l( S h V i) Y 3,3 As i+i • 

i=i 

The above sequences will appear at the calculation of partial derivatives of 
$(z). It is important to notice here that these sequences indeed constitute a 
finite-difference approximation of their continuous-time counterparts appearing 
at the statement of the proposition: to sec that one only need to look at the 
analytical definition of Yij in equations (|A.15I) - (|A.16|) . and realise that the 
sum J^k 1°§ (l + A i '(wfc-i)As / t + cr'(wfc_i)Axfc) is essentially a finite-difference 
approximation of f (p 1 (v u )du+a' (v u )dx u ) as for e « we have that log(l+e) ~ e. 
We can now proceed to the calculation of the partial derivatives of $. 
Clearly: 

9$ = 9£i + 9$2 = y- ,0*i y | d$2 y v 
dxj dxj dxj 4r*. dvi l ' J dut % '' 

We can easily get: 

o^ M 

V*. r ..-y tt Y . 

i>j m—1 

Using (|A.12|) . tedious, but otherwise straightforward calculation will give that, 
for vector index 1 < u < N: 

M uAi m — l 

(CV$i(x))„ = Y a m (s uMm {F m ,i m -i + Y imtim ) + Y F m,k Asfc+i 

m—1 k—1 

(A.18) 
Proceeding to the second term, ^(cc), we have that: 

Y ~g^- ' Y ^J = Y & Y zi(s l7 v l )As l+1 Y itj . 

i>j /— 1 i>j 

We now now multiply with the covariance matrix C to obtain, after some cal- 
culations, for 1 < u < N: 

L u-l 

(CV$ 2 (a:)) T , = 53A(*u(G, >JV + J, lJV )-2(G t i 1 fc + Ji,fc) A »*+i) ■ ( A - 19 ) 
i=i fc=i 
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Upon inspection, (|A.18|) - (|A.19I) provide the proof of the statement of the propo- 
sition. 

□ 
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